Downstream analysis pySCENIC¶

Downstream analysis using the loom file generated from this notebook

Extract relevant data from the integrated loom file¶

Using anndata¶

--> This might be very slow. Consider passing `cache=True`, which enables much faster reading from a cache file.
Create regulons from a dataframe of enriched features.
Additional columns saved: []

UMAP and knn graph from auc_mtx¶

computing PCA
    with n_comps=50
    finished (0:00:00)
computing neighbors
    using 'X_pca' with n_pcs = 40
    finished: added to `.uns['neighbors']`
    `.obsp['distances']`, distances for each pair of neighbors
    `.obsp['connectivities']`, weighted adjacency matrix (0:00:17)
computing UMAP
    finished: added
    'X_umap', UMAP coordinates (adata.obsm) (0:00:06)
Index(['ARID3A(+)', 'ARNTL(+)', 'ATF1(+)', 'ATF2(+)', 'ATF4(+)', 'ATF6(+)',
       'BACH1(+)', 'BACH2(+)', 'BCL11B(+)', 'BCL6(+)',
       ...
       'ZNF576(+)', 'ZNF580(+)', 'ZNF587(+)', 'ZNF600(+)', 'ZNF655(+)',
       'ZNF669(+)', 'ZNF680(+)', 'ZNF708(+)', 'ZNF770(+)', 'ZSCAN26(+)'],
      dtype='object', length=219)
normalizing counts per cell
    finished (0:00:00)
running Leiden clustering
    finished: found 3 clusters and added
    'leiden', the cluster labels (adata.obs, categorical) (0:00:00)
ranking genes
    finished: added to `.uns['rank_genes_groups']`
    'names', sorted np.recarray to be indexed by group ids
    'scores', sorted np.recarray to be indexed by group ids
    'logfoldchanges', sorted np.recarray to be indexed by group ids
    'pvals', sorted np.recarray to be indexed by group ids
    'pvals_adj', sorted np.recarray to be indexed by group ids (0:00:19)

Leiden clusters, DEG, BP and TF¶

Cluster 0¶

Top up-regulated genes

pvals_adj logfoldchanges abs_lfc scores 0
RACK1 0.000000e+00 1.529024 1.529024 46.862011 0.867624
CTSW 0.000000e+00 1.714248 1.714248 46.433117 0.834566
ACTB 0.000000e+00 1.352698 1.352698 46.423500 0.984776
FCER1G 0.000000e+00 2.062953 2.062953 44.366268 0.779759
RPS17 0.000000e+00 1.198917 1.198917 42.338680 0.913151
PFN1 0.000000e+00 1.037864 1.037864 40.129578 0.941424
EEF1G 0.000000e+00 1.320802 1.320802 40.103203 0.809482
ATP5F1E 0.000000e+00 1.268447 1.268447 37.858982 0.848485
RPL31 3.750411e-295 0.749093 0.749093 36.888947 0.950123
TMSB10 7.722950e-295 0.695977 0.695977 36.867626 0.982456
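
A table like the one above can be assembled from a per-cluster differential expression frame. The following is a minimal sketch, not the code used in this notebook: the function name `top_upregulated`, the `alpha` cutoff, the row count `n` and the placeholder gene `GENE_X` are assumptions for illustration. Only the column names `pvals_adj`, `logfoldchanges`, `abs_lfc` and `scores` come from the output above.

```python
import pandas as pd


def top_upregulated(deg: pd.DataFrame, n: int = 10, alpha: float = 0.05) -> pd.DataFrame:
    """Return the n highest-scoring significant, up-regulated genes.

    `deg` is indexed by gene and needs the columns
    'pvals_adj', 'logfoldchanges' and 'scores'.
    """
    out = deg.copy()
    # abs_lfc is the absolute log fold change, as in the tables above.
    out["abs_lfc"] = out["logfoldchanges"].abs()
    # Keep significant genes with a positive fold change (assumed cutoffs).
    keep = (out["pvals_adj"] < alpha) & (out["logfoldchanges"] > 0)
    return out.loc[keep].sort_values("scores", ascending=False).head(n)


# Toy input: three rows copied from the Cluster 0 table plus one
# hypothetical down-regulated gene (GENE_X) that should be filtered out.
deg = pd.DataFrame(
    {
        "pvals_adj": [0.0, 0.0, 3.750411e-295, 1e-10],
        "logfoldchanges": [1.529024, 2.062953, 0.749093, -1.2],
        "scores": [46.862011, 44.366268, 36.888947, -20.0],
    },
    index=["RACK1", "FCER1G", "RPL31", "GENE_X"],
)
top = top_upregulated(deg)
print(top)
```

Sorting by `scores` rather than by fold change keeps broadly expressed genes with modest but very consistent changes near the top, which matches the ordering seen in the tables.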

Up- and down-regulated pathways

Cluster 0 does not up-regulate genes for immune checkpoint receptors associated with NK exhaustion. The enriched pathways point to immune processes.

Cluster 1¶

Top up-regulated genes

pvals_adj logfoldchanges abs_lfc scores 1
MTRNR2L12 0.0 5.428245 5.428245 85.957787 0.985603
TPT1 0.0 1.528299 1.528299 73.922653 0.997876
METRNL 0.0 4.278031 4.278031 70.394096 0.842577
CLEC2B 0.0 2.772367 2.772367 64.237289 0.950673
JUND 0.0 3.219890 3.219890 64.022560 0.884116
LINC01578 0.0 3.653805 3.653805 62.212978 0.778145
LDHA 0.0 2.702229 2.702229 61.946045 0.938872
GNAS 0.0 2.926696 2.926696 61.386494 0.874675
HIST1H4C 0.0 3.066264 3.066264 61.276398 0.853198
HLA-A 0.0 1.099264 1.099264 60.495953 0.999056

Cluster 1 looks like the opposite of Cluster 0, with down-regulation of immune-related pathways and active cell transcription. Could these cells be exhausted? No separation between exhausted and resident NK cells was found in the UMAP.

Cluster 2¶

Top up-regulated genes

pvals_adj logfoldchanges abs_lfc scores 2
ATP5E 0.000000e+00 8.162164 8.162164 49.703804 0.602961
GNB2L1 0.000000e+00 7.683477 7.683477 47.699005 0.580579
ATP5L 0.000000e+00 7.687450 7.687450 37.945911 0.460399
ATP5G2 9.237418e-304 8.493318 8.493318 37.404438 0.452135
RPL7 5.495666e-272 0.595198 0.595198 35.389103 0.892218
RARRES3 3.587023e-260 7.555178 7.555178 34.610233 0.420110
C14orf2 2.127094e-249 7.573108 7.573108 33.884426 0.411157
SEPT7 3.287299e-247 8.204267 8.204267 33.734791 0.408058
TCEB2 1.044718e-233 7.623756 7.623756 32.798454 0.397727
RPL34 8.169294e-222 0.370979 0.370979 31.949730 0.949036

Cluster 2 up-regulates more pathways related to protein localization and protein translation; it looks like the midpoint between the first two clusters (as in the UMAP).

TFs for clusters¶

Regulon specificity scores (RSS) across clusters¶

   ARID3A_(+)  ARNTL_(+)  ATF1_(+)  ATF2_(+)  ATF4_(+)  ATF6_(+)  BACH1_(+)  \
2    0.394791   0.318480  0.357749  0.378298  0.353037  0.296966   0.348593   
0    0.460208   0.475935  0.538349  0.478939  0.525389  0.438195   0.501305   
1    0.337867   0.401619  0.379810  0.355069  0.410612  0.331232   0.438827   

   BACH2_(+)  BCL11B_(+)  BCL6_(+)  ...  ZNF576_(+)  ZNF580_(+)  ZNF587_(+)  \
2   0.342087    0.336559  0.354982  ...    0.397024    0.375913    0.247703   
0   0.470368    0.497121  0.356044  ...    0.507455    0.533577    0.319101   
1   0.401026    0.455151  0.389804  ...    0.319025    0.387931    0.235772   

   ZNF600_(+)  ZNF655_(+)  ZNF669_(+)  ZNF680_(+)  ZNF708_(+)  ZNF770_(+)  \
2    0.331257    0.272575    0.399379    0.517156    0.361058    0.339316   
0    0.418928    0.351432    0.525137    0.352200    0.220787    0.266013   
1    0.353573    0.250323    0.324060    0.280323    0.227524    0.216985   

   ZSCAN26_(+)  
2     0.280710  
0     0.393594  
1     0.282187  

[3 rows x 219 columns]
    mean  StDev  Ratio
2  0.347  0.048  0.207
0  0.463  0.082  0.491
1  0.365  0.072  0.302
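
The summary table above can be reproduced from an RSS matrix with a few lines of pandas. This is a sketch under assumptions: `mean` and `StDev` are taken across regulons for each group, and `Ratio` is assumed to be the fraction of cells in each group (the Ratio values in each summary table sum to about 1, which is consistent with that reading but not confirmed by the notebook). The function name `summarize_rss` and the toy cell labels are hypothetical.

```python
import pandas as pd


def summarize_rss(rss: pd.DataFrame, labels: pd.Series) -> pd.DataFrame:
    """Per-group mean and standard deviation of RSS, plus group cell fraction.

    rss    : groups x regulons matrix of specificity scores
    labels : one group label per cell
    """
    summary = pd.DataFrame(
        {
            "mean": rss.mean(axis=1),
            "StDev": rss.std(axis=1),
            # Assumed meaning of 'Ratio': share of cells in each group.
            "Ratio": labels.value_counts(normalize=True).reindex(rss.index),
        }
    )
    return summary.round(3)


# Toy RSS matrix: first three regulons of the cluster table above.
rss = pd.DataFrame(
    {
        "ARID3A_(+)": [0.394791, 0.460208, 0.337867],
        "ARNTL_(+)": [0.318480, 0.475935, 0.401619],
        "ATF1_(+)": [0.357749, 0.538349, 0.379810],
    },
    index=["2", "0", "1"],
)
# Hypothetical labels for ten cells.
labels = pd.Series(["0"] * 5 + ["1"] * 3 + ["2"] * 2)
summary = summarize_rss(rss, labels)
print(summary)
```

Keeping the cell fraction next to the score statistics makes it easy to see whether the scores simply track group size.
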
ARID3A_(+) ARNTL_(+) ATF1_(+) ATF2_(+) ATF4_(+) ATF6_(+) BACH1_(+) BACH2_(+) BCL11B_(+) BCL6_(+) ... ZNF576_(+) ZNF580_(+) ZNF587_(+) ZNF600_(+) ZNF655_(+) ZNF669_(+) ZNF680_(+) ZNF708_(+) ZNF770_(+) ZSCAN26_(+)
MGUS_CD138nCD45p_2_ACTGCTCGTCTAGGTT-1 0 0 0 1 0 0 0 0 0 1 ... 1 0 1 1 0 0 1 0 0 0
MGUS_CD138nCD45p_2_CACATTTTCATAACCG-1 0 0 0 0 0 1 0 0 0 0 ... 0 1 0 1 0 0 0 0 0 0
MGUS_CD138nCD45p_2_CCAATCCGTGCAGACA-1 0 0 0 0 0 0 0 1 0 1 ... 0 0 0 0 0 0 1 0 0 0
MGUS_CD138nCD45p_2_GCATGTACAATCGAAA-1 1 0 0 0 0 0 0 0 0 1 ... 1 0 0 1 0 0 1 0 0 0
MGUS_CD138nCD45p_2_GCGAGAATCTGCGTAA-1 0 0 0 0 0 1 0 0 0 1 ... 0 0 0 0 0 0 1 0 0 1

5 rows × 219 columns

The specificity scores for the clusters reach ~0.55: not very high, but unlike in previous analyses the scores decrease meaningfully. The scores are also in the range of the examples provided by the group that developed SCENIC. The heatmap also roughly groups the clusters together. The AUC does not look high; probably these specific regulons are not highly expressed across the clusters. Check:

Expression of cluster 2 TFs along the UMAP
Expression of cluster 0 TFs along the UMAP
Expression of cluster 1 TFs along the UMAP

The TFs are indeed expressed in the different clusters. Judging by the TFs in cluster 2, a more granular clustering might be useful. Next, I save the TFs and regulons to a JSON file for export.
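
The export step could look roughly like the sketch below. It is not the notebook's original code: the helper names, the choice of the top `n` regulons per cluster, the output file name and the target genes (`GENE_A` and so on) are all placeholders. Only the regulon names and RSS values are taken from the table above.

```python
import json
import os
import tempfile

import pandas as pd


def top_regulons_per_group(rss: pd.DataFrame, n: int = 5) -> dict:
    """Map each group to its n regulons with the highest RSS."""
    return {
        str(group): rss.loc[group].sort_values(ascending=False).head(n).index.tolist()
        for group in rss.index
    }


def export_regulons(rss: pd.DataFrame, regulon_targets: dict, path: str, n: int = 5) -> dict:
    """Write {group: {regulon: [target genes]}} to a JSON file and return it."""
    top = top_regulons_per_group(rss, n)
    payload = {
        group: {name: regulon_targets.get(name, []) for name in names}
        for group, names in top.items()
    }
    with open(path, "w") as fh:
        json.dump(payload, fh, indent=2)
    return payload


# RSS values copied from the cluster table above (three regulons only).
rss = pd.DataFrame(
    {
        "ATF1_(+)": [0.357749, 0.538349, 0.379810],
        "BACH1_(+)": [0.348593, 0.501305, 0.438827],
        "ZNF680_(+)": [0.517156, 0.352200, 0.280323],
    },
    index=["2", "0", "1"],
)
# Hypothetical target genes; real ones would come from the regulon objects.
regulon_targets = {
    "ATF1_(+)": ["GENE_A", "GENE_B"],
    "BACH1_(+)": ["GENE_C"],
    "ZNF680_(+)": ["GENE_D", "GENE_E"],
}
out_path = os.path.join(tempfile.gettempdir(), "regulons_per_cluster.json")
payload = export_regulons(rss, regulon_targets, out_path, n=1)

# Read the file back to confirm the round trip.
with open(out_path) as fh:
    reloaded = json.load(fh)
print(reloaded)
```

Plain lists and dictionaries are used on purpose, so the file can be read outside Python without any SCENIC-specific classes.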

TFs for cell types¶

Regulon specificity scores (RSS) across cell types¶

                          ARID3A_(+)  ARNTL_(+)  ATF1_(+)  ATF2_(+)  ATF4_(+)  \
CD56dimCD16+ NK cells       0.657145   0.685951  0.799759  0.695849  0.819322   
CD56brightCD16- NK cells    0.236673   0.223587  0.237522  0.229460  0.244489   
NK cell progenitors         0.170845   0.169895  0.170256  0.170300  0.171402   

                          ATF6_(+)  BACH1_(+)  BACH2_(+)  BCL11B_(+)  \
CD56dimCD16+ NK cells     0.531031   0.824237   0.683304    0.834986   
CD56brightCD16- NK cells  0.234234   0.236023   0.259407    0.226584   
NK cell progenitors       0.171323   0.170173   0.169722    0.169580   

                          BCL6_(+)  ...  ZNF576_(+)  ZNF580_(+)  ZNF587_(+)  \
CD56dimCD16+ NK cells     0.554315  ...    0.699939    0.830660    0.331016   
CD56brightCD16- NK cells  0.218650  ...    0.231419    0.245246    0.204168   
NK cell progenitors       0.169995  ...    0.170470    0.171334    0.170350   

                          ZNF600_(+)  ZNF655_(+)  ZNF669_(+)  ZNF680_(+)  \
CD56dimCD16+ NK cells       0.573349    0.376303    0.732088    0.547959   
CD56brightCD16- NK cells    0.218755    0.220135    0.239966    0.225025   
NK cell progenitors         0.168670    0.172754    0.170127    0.169853   

                          ZNF708_(+)  ZNF770_(+)  ZSCAN26_(+)  
CD56dimCD16+ NK cells       0.303017    0.323380     0.416029  
CD56brightCD16- NK cells    0.205915    0.215546     0.297538  
NK cell progenitors         0.171675    0.169546     0.168335  

[3 rows x 219 columns]
                           mean  StDev  Ratio
CD56dimCD16+ NK cells     0.682  0.175  0.938
CD56brightCD16- NK cells  0.232  0.014  0.060
NK cell progenitors       0.170  0.001  0.001

The specificity scores differ considerably among the cell types; CD56dim NK cells have the highest scores. The AUC looks better, especially for the CD56dim NK population.

The specificity for the cell types looks odd: within each cell type, all TFs fall in a narrow range of specificity, and the scores resemble the cell-type proportions. The RSS is probably biased by the large differences in proportions.
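
The suspected bias can be checked with a small numerical experiment. The sketch below assumes that RSS is one minus the Jensen-Shannon distance (the square root of the divergence, natural log) between the normalized regulon activity over cells and the normalized group indicator. That definition is my understanding of how SCENIC computes RSS and is not confirmed by this notebook; the function names and the toy data are illustrative.

```python
import numpy as np


def js_divergence(p: np.ndarray, q: np.ndarray) -> float:
    """Jensen-Shannon divergence (natural log) of two discrete distributions."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    m = 0.5 * (p + q)

    def kl(a: np.ndarray, b: np.ndarray) -> float:
        mask = a > 0  # terms with a == 0 contribute nothing
        return float(np.sum(a[mask] * np.log(a[mask] / b[mask])))

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def rss(auc: np.ndarray, in_group: np.ndarray) -> float:
    """Assumed RSS: 1 - sqrt(JSD) between regulon activity and group membership."""
    p = auc / auc.sum()
    q = in_group / in_group.sum()
    return 1.0 - float(np.sqrt(js_divergence(p, q)))


# A regulon with identical activity in every cell carries no
# cell-type information at all.
n_cells = 10_000
flat_auc = np.ones(n_cells)

# Group sizes mirror the Ratio column of the cell-type summary table.
scores = {}
for frac in (0.938, 0.060, 0.001):
    members = np.zeros(n_cells)
    members[: int(round(frac * n_cells))] = 1.0
    scores[frac] = rss(flat_auc, members)
    print(f"fraction {frac:.3f} -> RSS {scores[frac]:.3f}")

# Lower bound of the score under this definition.
floor = 1.0 - np.sqrt(np.log(2.0))
print(f"floor: {floor:.3f}")
```

Under this assumption a completely uninformative regulon already scores roughly 0.85, 0.24 and 0.17 for groups covering 93.8%, 6.0% and 0.1% of the cells. These values are close to the per-cell-type means in the summary table, which supports the idea that group size, rather than regulon specificity, drives most of the RSS here.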

TFs for cell states¶

Regulon specificity scores (RSS) across cell states¶

              ARID3A_(+)  ARNTL_(+)  ATF1_(+)  ATF2_(+)  ATF4_(+)  ATF6_(+)  \
NK exhausted    0.463639   0.458321  0.497775  0.465179  0.503545  0.377034   
Others          0.276448   0.270407  0.280775  0.275992  0.276288  0.253112   
NK resident     0.430142   0.455180  0.484617  0.452651  0.495673  0.426380   

              BACH1_(+)  BACH2_(+)  BCL11B_(+)  BCL6_(+)  ...  ZNF576_(+)  \
NK exhausted   0.498114   0.434075    0.513515  0.420506  ...    0.466683   
Others         0.278188   0.271988    0.279047  0.265811  ...    0.280341   
NK resident    0.497253   0.494404    0.480839  0.383245  ...    0.453640   

              ZNF580_(+)  ZNF587_(+)  ZNF600_(+)  ZNF655_(+)  ZNF669_(+)  \
NK exhausted    0.506104    0.281226    0.401912    0.314387    0.479456   
Others          0.281104    0.223671    0.265765    0.240490    0.281403   
NK resident     0.495097    0.292813    0.421775    0.315120    0.464933   

              ZNF680_(+)  ZNF708_(+)  ZNF770_(+)  ZSCAN26_(+)  
NK exhausted    0.406792    0.280230    0.280744     0.344619  
Others          0.273959    0.235954    0.237725     0.242763  
NK resident     0.388548    0.247604    0.277210     0.360779  

[3 rows x 219 columns]
               mean  StDev  Ratio
NK exhausted  0.449  0.075  0.464
Others        0.269  0.018  0.105
NK resident   0.439  0.071  0.431

All the specificity scores reach a maximum of ~0.55, and regulon activity does not cluster the cells in the heatmap. The AUC also looks quite low. The low values could also stem from the high number of batches in the dataset. Still, some TFs are interesting (such as STAT).

Extra: batches¶

ARID3A_(+) ARNTL_(+) ATF1_(+) ATF2_(+) ATF4_(+) ATF6_(+) BACH1_(+) BACH2_(+) BCL11B_(+) BCL6_(+) ... ZNF576_(+) ZNF580_(+) ZNF587_(+) ZNF600_(+) ZNF655_(+) ZNF669_(+) ZNF680_(+) ZNF708_(+) ZNF770_(+) ZSCAN26_(+)
MGUS_CD138nCD45p_2 0.170896 0.169268 0.170189 0.170365 0.170055 0.170218 0.169829 0.170273 0.169811 0.171568 ... 0.170911 0.170567 0.170461 0.170509 0.169218 0.170486 0.175699 0.168928 0.167549 0.170142
MGUS_CD138nCD45p_3 0.204317 0.192854 0.198196 0.194469 0.194977 0.193858 0.194935 0.188656 0.192533 0.184365 ... 0.198002 0.197983 0.189419 0.194885 0.192884 0.198591 0.235459 0.173732 0.182398 0.187878
MGUS_CD138nCD45p_4 0.177391 0.173407 0.176083 0.175713 0.175022 0.174343 0.174629 0.174702 0.174425 0.171125 ... 0.176208 0.175900 0.173372 0.175044 0.175155 0.176817 0.185459 0.173322 0.169644 0.175739
MGUS_CD138nCD45p_5 0.176882 0.173296 0.175030 0.174044 0.174607 0.175083 0.174073 0.173334 0.173842 0.172365 ... 0.175379 0.175268 0.175146 0.173380 0.175684 0.175124 0.185886 0.169641 0.170601 0.173757
MGUS_CD138n_1 0.192502 0.185239 0.188397 0.189900 0.187138 0.189471 0.185122 0.184466 0.185889 0.182334 ... 0.188824 0.189817 0.180835 0.181638 0.182640 0.192862 0.213541 0.176922 0.182610 0.183272
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
SMM_CD138nCD45p_9 0.181477 0.177165 0.180174 0.181480 0.178789 0.178514 0.177638 0.177662 0.178149 0.179777 ... 0.182603 0.180533 0.179998 0.177006 0.175865 0.183166 0.198056 0.178603 0.176606 0.182598
SMM_CD138n_3 0.171903 0.169728 0.170921 0.171797 0.170510 0.169777 0.170556 0.170774 0.170349 0.170992 ... 0.171229 0.170837 0.170803 0.170697 0.169331 0.171389 0.173764 0.169179 0.170918 0.169807
SMM_CD138n_4 0.203920 0.188879 0.200221 0.199888 0.196458 0.190234 0.193584 0.195191 0.195148 0.193618 ... 0.201070 0.198847 0.186835 0.194454 0.192864 0.202913 0.238553 0.182198 0.184110 0.188803
SMM_CD138n_5 0.172469 0.170303 0.172419 0.173040 0.171650 0.170579 0.171879 0.172690 0.171502 0.172554 ... 0.173245 0.172636 0.170128 0.171631 0.171312 0.173533 0.179210 0.168251 0.170935 0.172364
SMM_CD138n_6 0.173069 0.172687 0.172206 0.172443 0.171678 0.171593 0.171539 0.171179 0.171610 0.172385 ... 0.172766 0.172447 0.169355 0.171222 0.170413 0.172914 0.178244 0.172118 0.172970 0.171701

76 rows × 219 columns

mean StDev Ratio
MGUS_CD138nCD45p_2 0.170 0.001 0.001
MGUS_CD138nCD45p_3 0.195 0.007 0.018
MGUS_CD138nCD45p_4 0.175 0.002 0.004
MGUS_CD138nCD45p_5 0.175 0.002 0.004
MGUS_CD138n_1 0.187 0.005 0.013
... ... ... ...
SMM_CD138nCD45p_9 0.180 0.004 0.006
SMM_CD138n_3 0.171 0.001 0.001
SMM_CD138n_4 0.197 0.008 0.019
SMM_CD138n_5 0.172 0.001 0.002
SMM_CD138n_6 0.172 0.001 0.002

76 rows × 3 columns

mean StDev Ratio
count 76.000000 76.000000 76.000000
mean 0.184039 0.004618 0.013158
std 0.021651 0.006751 0.023457
min 0.168000 0.000000 0.000000
25% 0.171750 0.001000 0.002000
50% 0.176000 0.002000 0.005500
75% 0.187500 0.006000 0.013250
max 0.285000 0.040000 0.145000